Alistair Martin
26/06/2017
data$meta[type=="plasma",.N,patient][,table(N)]
## N
##  3  4  5  6  7  8  9 11 12 15 17
##  2  7  6  1  4  5  1  1  1  1  1
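\[Z_i = \left(\frac{1}{i} + \frac{1}{n - i}\right)^{-1/2}\left(\frac{S_i}{i} - \frac{S_n - S_i}{n - i}\right),\]
where \(S_i = x_1 + \dots + x_i\) is the partial sum of the first \(i\) values.
- Let \(Z_B = \max_i |Z_i|\) for \(1 \le i < n\).
- The null hypothesis of no change is rejected if the statistic exceeds the upper \(\alpha\)th quantile of the null distribution of \(Z_B\).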
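As a rough illustration of the statistic above, the sketch below computes \(Z_i\) and \(Z_B\) for a single numeric vector in base R. The helper name `cbs_statistic` and the toy profile are hypothetical and not part of this analysis. The formula is applied exactly as written, so the data are assumed to have unit variance, and the null quantile needed for the test is not computed here.

```r
# Hypothetical helper: binary-segmentation statistic for a numeric vector x.
# Returns every Z_i, the maximum absolute value Z_B, and the index attaining it.
cbs_statistic <- function(x) {
  n <- length(x)
  stopifnot(n >= 2)
  S <- cumsum(x)        # partial sums S_1, ..., S_n
  i <- seq_len(n - 1)   # candidate change points 1, ..., n - 1
  Z <- (1 / i + 1 / (n - i))^(-1 / 2) *
    (S[i] / i - (S[n] - S[i]) / (n - i))
  list(Z = Z, Z_B = max(abs(Z)), i_hat = which.max(abs(Z)))
}

# Toy profile (made-up values): a step from 0 to 1 after the fifth bin
x <- c(rep(0, 5), rep(1, 5))
res <- cbs_statistic(x)
res$i_hat   # location of the strongest candidate change point
res$Z_B     # value compared against the null quantile
```

For this toy profile the maximum falls at the true step, since the difference in means on either side of the split is largest there.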
## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 1108 rows containing missing values (geom_point).
## Warning: Removed 473 rows containing missing values (geom_path).
We need a measure of the quality of each sample. However, traditional measures, such as the variance, are affected by CNAs. The QDNA package outputs (top right) the trimmed (0.01%) variance, but this also appears to be affected by the presence of CNAs. In the paper, the authors suggest that the first-order difference is a better measure of the noise.
\[\mathrm{Noise} = \mathrm{median}\left(|x_{i+1} - x_i|\right), \quad 1 \le i \le N - 1\]
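A minimal sketch of this noise measure in base R follows, together with a simulated comparison against the plain variance. The function name `noise_estimate` and the simulated profiles are assumptions made for illustration only. Dropping `NA` bins before differencing is also an assumption, and it treats the surviving bins as neighbours.

```r
# Hypothetical helper: median absolute first-order difference of a profile.
noise_estimate <- function(x) {
  x <- x[!is.na(x)]        # assumption: missing bins are simply dropped
  median(abs(diff(x)))     # diff() gives x[i + 1] - x[i] for i = 1, ..., N - 1
}

# Simulated profiles (made-up data): same noise, with and without one CNA
set.seed(1)
flat     <- rnorm(1000, sd = 0.1)
with_cna <- flat + rep(c(0, 1), each = 500)   # shift the second half by 1

c(var_flat = var(flat), var_cna = var(with_cna))
c(noise_flat = noise_estimate(flat), noise_cna = noise_estimate(with_cna))
```

A single breakpoint changes only one of the \(N - 1\) differences, so the median barely moves, whereas the variance increases substantially.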
Before we start an in-depth analysis of the segmentation, let's confirm that more segments are found within FFPE/FF/plasma samples than BC samples and that the segment count is independent of the noise.
\[\mathrm{Area} = \sum_{i=1}^{N} |S_i|\]
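The area measure can be sketched in one line of base R. Here \(S_i\) is assumed to be the segmented value assigned to bin \(i\), centred so that a copy-neutral bin sits at 0. The function name `segment_area` and the toy values are hypothetical.

```r
# Hypothetical helper: summed absolute segmented value across all bins.
segment_area <- function(seg) {
  sum(abs(seg), na.rm = TRUE)   # assumption: NA bins contribute nothing
}

# Toy segmented profile (made-up values): 4 neutral bins, 3 gained, 2 lost
seg <- c(rep(0, 4), rep(0.5, 3), rep(-0.25, 2))
segment_area(seg)   # 3 * 0.5 + 2 * 0.25 = 2
```

Because gains and losses both add to the total, a flat profile scores near zero whatever its noise level, while long or high-amplitude segments increase the score.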
## Warning: Removed 111 rows containing missing values (geom_point).
## Warning: Removed 38 rows containing missing values (geom_path).
x <- merge(absolute$meta[,.(Sample.name,call.status)],data$meta,all.y = TRUE)
table(x[,.(type,call.status)],useNA = "ifany")
##         call.status
## type     called high entropy high non-clonal <NA>
##   BC          3            0               0   24
##   FFPE       32            6               0    0
##   Frozen      4            0               0    0
##   plasma    109            3               1   89
## Warning in melt.data.table(x, id.vars = c("type", "call.status"),
## measure.vars = c("area.seg", : 'measure.vars' [area.seg, n.seg, noise]
## are not all of the same type. By order of hierarchy, the molten data value
## column will be of type 'double'. All measure variables not of type 'double'
## will be coerced to. Check DETAILS in ?melt.data.table for more on coercion.
## Warning in melt.data.table(x[!is.na(cycle)], id.vars = c("patient",
## "cycle", : 'measure.vars' [RECIST, purity, area.seg, n.seg, noise] are not
## all of the same type. By order of hierarchy, the molten data value column
## will be of type 'double'. All measure variables not of type 'double' will
## be coerced to. Check DETAILS in ?melt.data.table for more on coercion.
## Warning: Removed 35 rows containing missing values (geom_point).
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin16.1.0 (64-bit)
## Running under: macOS Sierra 10.12.3
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] stringr_1.2.0 data.table_1.10.4 ggthemes_3.4.0 ggplot2_2.2.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.11 knitr_1.16 magrittr_1.5 munsell_0.4.3
## [5] colorspace_1.3-2 rlang_0.1.1 plyr_1.8.4 tools_3.3.2
## [9] grid_3.3.2 gtable_0.2.0 htmltools_0.3.6 yaml_2.1.14
## [13] lazyeval_0.2.0 rprojroot_1.2 digest_0.6.12 assertthat_0.2.0
## [17] tibble_1.3.3 reshape2_1.4.2 evaluate_0.10 rmarkdown_1.6
## [21] labeling_0.3 stringi_1.1.5 scales_0.4.1 backports_1.1.0